Multilingual Models for Compositional Distributed Semantics

Authors

  • Karl Moritz Hermann
  • Phil Blunsom
Abstract

We present a novel technique for learning semantic representations, which extends the distributional hypothesis to multilingual data and joint-space embeddings. Our models leverage parallel data and learn to strongly align the embeddings of semantically equivalent sentences, while maintaining sufficient distance between those of dissimilar sentences. The models do not rely on word alignments or any syntactic information and are successfully applied to a number of diverse languages. We extend our approach to learn semantic representations at the document level, too. We evaluate these models on two cross-lingual document classification tasks, outperforming the prior state of the art. Through qualitative analysis and the study of pivoting effects we demonstrate that our representations are semantically plausible and can capture semantic relationships across languages without parallel data.
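The alignment objective described in the abstract (pull the embeddings of equivalent sentences together, keep dissimilar ones apart) can be illustrated with a margin-based hinge loss. The sketch below is only one plausible reading. Additive composition, squared Euclidean distance, the margin value, and the function names `compose`, `squared_distance` and `hinge_loss` are all assumptions made for illustration; none of them is stated in the abstract.

```python
def compose(word_vectors):
    """Sum word vectors into one sentence vector (assumed additive composition)."""
    dim = len(word_vectors[0])
    return [sum(vec[i] for vec in word_vectors) for i in range(dim)]


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def hinge_loss(source, target, noise, margin=1.0):
    """Zero once the parallel pair is closer than the noise pair by at least `margin`."""
    return max(0.0, margin + squared_distance(source, target)
               - squared_distance(source, noise))


# Toy 2-dimensional word vectors (hypothetical values, not real embeddings).
english = compose([[1.0, 0.0], [0.0, 1.0]])      # sentence in language A
german = compose([[1.0, 0.0], [0.0, 0.5]])       # its translation in language B
unrelated = compose([[-1.0, 0.0], [0.0, -1.0]])  # randomly sampled noise sentence

well_aligned = hinge_loss(english, german, unrelated)  # pair already close: no penalty
misaligned = hinge_loss(english, unrelated, german)    # roles swapped: large penalty
```

In this toy setting the loss is zero when the translation is much closer than the noise sentence, and positive when the roles are swapped, which is the behaviour a training procedure would minimise over many sampled noise sentences.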

Similar resources

Distributed representations for compositional semantics

The mathematical representation of semantics is a key issue for Natural Language Processing (NLP). A lot of research has been devoted to finding ways of representing the semantics of individual words in vector spaces. Distributional approaches—meaning distributed representations that exploit co-occurrence statistics of large corpora—have proved popular and successful across a number of tasks. H...

Towards Syntax-aware Compositional Distributional Semantic Models

Compositional Distributional Semantics Models (CDSMs) are traditionally seen as an entirely different world from Tree Kernels (TKs). In this paper, we show that under a suitable regime these two approaches can be regarded as the same and, thus, structural information and distributional semantics can successfully cooperate in CDSMs for NLP tasks. Leveraging distributed trees, we pres...

Compositional Semantics of an Actor-Based Language Using Constraint Automata

Rebeca is an actor-based language which has been successfully applied to model concurrent and distributed systems. The labeled transition system semantics of Rebeca is not compositional. In this paper, we investigate the possibility of mapping Rebeca models into a coordination language, Reo, and present a natural mapping that provides a compositional semantics of Rebeca. To this end, we cons...

Linear Compositional Distributional Semantics and Structural Kernels

In this paper, we begin an analysis of models for compositional distributional semantics (CDS) with respect to distributional similarity. We believe that this simple analysis of the properties of the similarity can help to better investigate new CDS models. We show that, viewed from this perspective, these models are closely related to convolution kernels (H...

Bringing machine learning and compositional semantics together

Computational semantics has long been seen as a field divided between logical and statistical approaches, but this divide is rapidly eroding, with the development of statistical models that learn compositional semantic theories from corpora and databases. This paper presents a simple discriminative learning framework for defining such models and relating them to logical theories. Within this fr...

Publication date: 2014